
    A Theoretical and Practical Framework for Evaluating Uncertainty Calibration in Object Detection

    The proliferation of Deep Neural Networks has made machine learning systems increasingly present in real-world applications. Consequently, there is a growing demand for highly reliable models in these domains, making the problem of uncertainty calibration pivotal for the future of deep learning. This is especially true for object detection systems, which are commonly deployed in safety-critical applications such as autonomous driving and robotics. For this reason, this work presents a novel theoretical and practical framework to evaluate object detection systems in the context of uncertainty calibration. The robustness of the proposed uncertainty calibration metrics is shown through a series of representative experiments. Code for the proposed uncertainty calibration metrics is available at: https://github.com/pedrormconde/Uncertainty_Calibration_Object_Detection.
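
    To illustrate what calibration evaluation involves, the sketch below computes a generic binned Expected Calibration Error over detection confidences. The paper's own detection-specific metrics live in the linked repository; the binning scheme, the IoU-based notion of correctness, and the function name here are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch: a generic binned Expected Calibration Error (ECE)
# over detection confidences. Not the paper's metrics; see the linked
# repository for those.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: predicted scores in [0, 1]; correct: 1 if the detection
    matched a ground-truth box (e.g. IoU >= 0.5), else 0."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        avg_conf = confidences[in_bin].mean()   # mean predicted confidence
        accuracy = correct[in_bin].mean()       # empirical precision in bin
        ece += in_bin.mean() * abs(avg_conf - accuracy)
    return ece
```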

    A Probabilistic Approach for Human Everyday Activities Recognition using Body Motion from RGB-D Images

    In this work, we propose an approach that relies on cues from depth perception from RGB-D images, where features related to human body motion (3D skeleton features) are used with multiple learning classifiers in order to recognize human activities on a benchmark dataset. A Dynamic Bayesian Mixture Model (DBMM) is designed to combine multiple classifier likelihoods into a single form, assigning weights (obtained from an uncertainty measure) to counterbalance the likelihoods as a posterior probability. Temporal information is incorporated into the DBMM by means of prior probabilities, taking into consideration previous probabilistic inference to reinforce current-frame classification. The publicly available Cornell Activity Dataset [1], with 12 different human activities, was used to evaluate the proposed approach. Reported results on the testing dataset show that our approach outperforms state-of-the-art methods in terms of precision, recall and overall accuracy. The developed work enables the use of activity classification in applications where human behaviour recognition is important, such as human-robot interaction and assisted living for elderly care, among others.
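
    A minimal sketch of the fusion idea described above, assuming an entropy-based uncertainty measure for the weights and the previous posterior acting as the temporal prior; the DBMM's exact weighting scheme and update rule may differ from this illustration.

```python
# Minimal DBMM-style fusion sketch (assumed details: entropy-based weights
# and a multiplicative temporal update; the paper's formulation may differ).
import numpy as np

def entropy_weights(likelihoods):
    """likelihoods: one per-class probability vector per base classifier."""
    ents = np.array([-(p * np.log(p + 1e-12)).sum() for p in likelihoods])
    inv = 1.0 / (ents + 1e-12)  # lower entropy -> more confident -> higher weight
    return inv / inv.sum()

def dbmm_step(prev_posterior, likelihoods):
    """Fuse base-classifier outputs, reinforced by the previous posterior."""
    w = entropy_weights(likelihoods)
    fused = sum(wi * p for wi, p in zip(w, likelihoods))  # weighted mixture
    posterior = prev_posterior * fused                    # temporal prior acts here
    return posterior / posterior.sum()
```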

    Intelligent Robotic Perception Systems

    Robotic perception is related to many applications in robotics where sensory data and artificial intelligence/machine learning (AI/ML) techniques are involved. Examples of such applications are object detection, environment representation, scene understanding, human/pedestrian detection, activity recognition, semantic place classification, and object modeling, among others. Robotic perception, in the scope of this chapter, encompasses the ML algorithms and techniques that empower robots to learn from sensory data and, based on learned models, to react and make decisions accordingly. The recent developments in machine learning, namely deep-learning approaches, are evident, and, consequently, robotic perception systems are evolving in a way that makes new applications and tasks a reality. Recent advances in human-robot interaction, complex robotic tasks, intelligent reasoning, and decision-making are, to some extent, the result of the notable evolution and success of ML algorithms. This chapter covers recent and emerging topics and use-cases related to intelligent perception systems in robotics.

    Towards multimodal affective expression: merging facial expressions and body motion into emotion

    Affect recognition plays an important role in everyday human life, as expressions are a substantial means of communication. Humans can rely on different channels of information to understand the affective messages communicated by others. Similarly, it is expected that an automatic affect recognition system should be able to analyse different types of emotion expressions. In this respect, an important issue to be addressed is the fusion of different channels of expression, taking into account the relationship and correlation across different modalities. In this work, affective facial and bodily motion expressions are addressed as channels for the communication of affect, and an emotion recognition system is designed around them. A probabilistic approach is used to combine features from the two modalities, incorporating geometric facial expression features and body motion skeleton-based features. Preliminary results show that the presented approach has potential for automatic emotion recognition and can be used for human-robot interaction.
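
    One common way to fuse two modalities probabilistically is to combine per-modality class posteriors under a conditional-independence assumption; the sketch below shows that scheme, which is an illustrative stand-in rather than the paper's exact formulation.

```python
# Hypothetical fusion of per-modality class probabilities (facial and
# body-motion classifiers) assuming conditional independence given the
# class. One common scheme; not necessarily the paper's formulation.
import numpy as np

def fuse_modalities(p_face, p_body, prior):
    """p_face, p_body: per-class posteriors from each modality's classifier;
    prior: per-class prior probabilities. All arrays of shape (n_classes,)."""
    # p(c | x_face, x_body) is proportional to p(c|x_face) * p(c|x_body) / p(c)
    joint = p_face * p_body / np.maximum(prior, 1e-12)
    return joint / joint.sum()
```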

    Dynamic Bayesian Network for Time-Dependent Classification Problems in Robotics

    This chapter discusses the use of dynamic Bayesian networks (DBNs) for time-dependent classification problems in mobile robotics, where Bayesian inference is used to infer the class, or category of interest, given the observed data and prior knowledge. Formulating the DBN as a time-dependent classification problem, and by making some assumptions, a general expression for a DBN is given in terms of classifier priors and likelihoods through the time steps. Since multi-class problems are addressed, and because of the number of time slices in the model, additive smoothing is used to prevent the values of the priors from approaching zero. To demonstrate the effectiveness of DBNs in time-dependent classification problems, experimental results are reported on semantic place recognition and daily-activity classification.
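
    A sketch of the recursive form this describes, with notation assumed for illustration: the previous posterior serves as the prior at each time step, and additive (Laplace) smoothing over the K classes keeps prior values away from zero.

```latex
% Notation assumed for illustration; the chapter's own derivation may differ.
% Class posterior at time step t, with the previous posterior as the prior:
\[
  P(C_t = c \mid o_{1:t}) \;\propto\; P(o_t \mid C_t = c)\, P(C_{t-1} = c \mid o_{1:t-1})
\]
% Additive (Laplace) smoothing with constant \alpha > 0 over K classes:
\[
  \tilde{P}(c) = \frac{P(c) + \alpha}{\sum_{k=1}^{K} \left( P(k) + \alpha \right)}
               = \frac{P(c) + \alpha}{1 + \alpha K}
\]
```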

    Affective facial expressions recognition for human-robot interaction

    Affective facial expression is a key feature of nonverbal behaviour and is considered a symptom of an internal emotional state. Emotion recognition plays an important role in social communication: human-to-human and also human-to-robot. Taking this as inspiration, this work aims at the development of a framework able to recognise human emotions through facial expressions for human-robot interaction. Features based on facial landmark distances and angles are extracted to feed a dynamic probabilistic classification framework. The public online dataset Karolinska Directed Emotional Faces (KDEF) [1] is used to learn seven different emotions (angry, fearful, disgusted, happy, sad, surprised, and neutral) performed by seventy subjects. A new dataset was created in order to record stimulated affect: participants watched video sessions designed to awaken their emotions, unlike the KDEF dataset, where participants are actors performing expressions when asked to. Offline and on-the-fly tests were carried out: leave-one-out cross-validation tests on the datasets and on-the-fly tests with human-robot interaction. Results show that the proposed framework can correctly recognise human facial expressions, with potential to be used in human-robot interaction scenarios.
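
    A minimal sketch of landmark-based geometric features of the kind described above: pairwise distances and angles between facial landmarks. The landmark indexing and the inter-ocular normalization are illustrative assumptions, not the paper's exact feature set.

```python
# Hypothetical landmark-based geometric features: pairwise distances and
# angles. Landmark indices and normalization are illustrative assumptions.
import numpy as np

def landmark_features(pts):
    """pts: array of shape (n_landmarks, 2) with 2D facial landmark coords;
    landmarks 0 and 1 are assumed to be the eye centers."""
    scale = np.linalg.norm(pts[0] - pts[1]) + 1e-12  # inter-ocular distance
    feats = []
    n = len(pts)
    for i in range(n):
        for j in range(i + 1, n):
            v = pts[j] - pts[i]
            feats.append(np.linalg.norm(v) / scale)  # scale-invariant distance
            feats.append(np.arctan2(v[1], v[0]))     # orientation angle
    return np.array(feats)
```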

    TReR: A Lightweight Transformer Re-Ranking Approach for 3D LiDAR Place Recognition

    Autonomous driving systems often require reliable loop closure detection to guarantee reduced localization drift. Recently, 3D LiDAR-based localization methods have used retrieval-based place recognition to find revisited places efficiently. However, when deployed in challenging real-world scenarios, place recognition models become more complex, which comes at the cost of high computational demand. This work tackles this problem from an information-retrieval perspective, adopting a first-retrieve-then-re-rank paradigm, where an initial loop-candidate ranking, generated by a 3D place recognition model, is re-ordered by the proposed lightweight transformer-based re-ranking approach (TReR). The proposed approach relies on global descriptors only, and is thus agnostic to the place recognition model. The experimental evaluation, conducted on the KITTI Odometry dataset, compares TReR with state-of-the-art re-ranking approaches such as alphaQE and SGV; the results indicate that TReR is both more robust and more efficient than alphaQE, while offering a good trade-off between robustness and efficiency when compared to SGV.
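
    The sketch below illustrates the retrieve-then-re-rank pattern on global descriptors: a small transformer encoder jointly attends over the query and the top-K candidates and re-scores them. The class name, dimensions, and scoring head are assumptions for illustration, not the TReR architecture.

```python
# Hypothetical re-ranker over global descriptors, illustrating the
# first-retrieve-then-re-rank pattern. Not the TReR architecture.
import torch
import torch.nn as nn

class DescriptorReRanker(nn.Module):
    def __init__(self, dim=256, heads=4, layers=2):
        super().__init__()
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                         batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
        self.score = nn.Linear(dim, 1)  # per-candidate relevance score

    def forward(self, query, candidates):
        """query: (B, dim); candidates: (B, K, dim) global descriptors
        from the initial retrieval stage."""
        seq = torch.cat([query.unsqueeze(1), candidates], dim=1)  # (B, 1+K, dim)
        out = self.encoder(seq)           # query and candidates attend jointly
        return self.score(out[:, 1:, :]).squeeze(-1)  # (B, K) new scores

# Usage: re-order the initial candidate list by scores.argsort(descending=True).
```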

    Multimodal Bayesian Network for Artificial Perception

    In order to make machines perceive their external environment coherently, multiple sources of sensory information derived from several different modalities can be used (e.g. cameras, LIDAR, stereo, RGB-D, and radar). All these different sources of information can be efficiently merged to form a robust perception of the environment. Some of the mechanisms that underlie this merging of sensor information are highlighted in this chapter, showing that, depending on the type of information, different combination and integration strategies can be used, and that prior knowledge is often required for interpreting the sensory signals efficiently. The notion that perception involves Bayesian inference is an increasingly popular position taken by a considerable number of researchers. Bayesian models have provided insights into many perceptual phenomena, showing that they are a valid approach for dealing with real-world uncertainties and for robust classification, including classification in time-dependent problems. This chapter addresses the use of Bayesian networks applied to sensory perception in the following areas: mobile robotics, autonomous driving systems, advanced driver assistance systems, sensor fusion for object detection, and EEG-based mental state classification.
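
    A toy example of the fusion idea above: two sensors observe the same object, and a naive Bayes network combines their likelihoods with a prior. The numbers and class names are illustrative only.

```python
# Toy multimodal fusion with a naive Bayes network: two sensor likelihoods
# combined with a prior, assuming readings are conditionally independent
# given the class. All numbers are illustrative.
import numpy as np

classes = ["pedestrian", "vehicle", "background"]
prior = np.array([0.2, 0.3, 0.5])
p_camera = np.array([0.7, 0.2, 0.1])  # P(z_camera | class)
p_lidar = np.array([0.6, 0.3, 0.1])   # P(z_lidar | class)

posterior = prior * p_camera * p_lidar
posterior /= posterior.sum()
print(dict(zip(classes, posterior.round(3))))  # fused belief: pedestrian dominates
```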

    HMAPs - Hybrid height-Voxel maps for environment representation

    This paper presents a hybrid 3D-like grid-based mapping approach, which we call HMAP, used as a reliable and efficient 3D representation of the environment surrounding a mobile robot. Taking 3D point clouds as input data, the proposed mapping approach represents height-voxel (HVoxel) elements inside the HMAP, where free and occupied space are modeled through HVoxels, resulting in a reliable method for 3D representation. The proposed method corrects some of the problems inherent in 2D and 2.5D representations of complex environments, while keeping an updated grid representation. Additionally, we also propose a complete SLAM pipeline based on HMAPs. Indoor and outdoor experiments were carried out to validate the proposed representation, using data from a Microsoft Kinect One (indoor) and a Velodyne VLP-16 LiDAR (outdoor). The obtained results show that HMAPs can provide a more detailed view of complex elements in a scene when compared to a classic 2.5D representation. Moreover, the proposed SLAM approach was validated on an outdoor dataset with promising results, which lay a foundation for further research on the topic.
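
    A minimal sketch of the height-voxel idea: each 2D grid cell stores a list of vertical occupied intervals instead of a full 3D voxel column, which is what lets such maps capture overhangs that 2.5D elevation maps miss. Field names and the interval-merge rule are assumptions for illustration, not the paper's exact data structure.

```python
# Hypothetical height-voxel map sketch: a 2D grid whose cells hold lists of
# vertical intervals (HVoxels). Merge rule and names are assumptions.
from collections import defaultdict

class HMap:
    def __init__(self, resolution=0.1):
        self.res = resolution
        self.cells = defaultdict(list)  # (ix, iy) -> list of [z_min, z_max]

    def insert_point(self, x, y, z, thickness=0.05):
        """Register an occupied point, growing a nearby HVoxel if one exists."""
        key = (int(x // self.res), int(y // self.res))
        for iv in self.cells[key]:
            if iv[0] - thickness <= z <= iv[1] + thickness:
                iv[0], iv[1] = min(iv[0], z), max(iv[1], z)  # extend interval
                return
        # Otherwise start a new HVoxel around the point.
        self.cells[key].append([z - thickness / 2, z + thickness / 2])
```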